83 research outputs found

Constructing ensembles for intrinsically disordered proteins

Author: Fisher Charles K.
Stultz Collin M.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2011

The relatively flat energy landscapes associated with intrinsically disordered proteins make modeling these systems especially problematic. A comprehensive model for these proteins requires one to build an ensemble consisting of a finite collection of structures, and their corresponding relative stabilities, which adequately capture the range of accessible states of the protein. In this regard, methods that use computational techniques to interpret experimental data in terms of such ensembles are an essential part of the modeling process. In this review, we critically assess the advantages and limitations of current techniques and discuss new methods for the validation of these ensembles.

DSpace@MIT

PubMed Central

Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals

Author: Ghassemi Marzyeh
Jeong Hyewon
Stultz Collin M.
Publication date: 10/09/2023

Heart failure is a debilitating condition that affects millions of people worldwide and has a significant impact on their quality of life and mortality rates. An objective assessment of cardiac pressures remains an important method for the diagnosis and treatment prognostication for patients with heart failure. Although cardiac catheterization is the gold standard for estimating central hemodynamic pressures, it is an invasive procedure that carries inherent risks, making it a potentially dangerous procedure for some patients. Approaches that leverage non-invasive signals - such as electrocardiogram (ECG) - have the promise to make the routine estimation of cardiac pressures feasible in both inpatient and outpatient settings. Prior models trained to estimate intracardiac pressures (e.g., mean pulmonary capillary wedge pressure (mPCWP)) in a supervised fashion have shown good discriminatory ability but have been limited to the labeled dataset from the heart failure cohort. To address this issue and build a robust representation, we apply deep metric learning (DML) and propose a novel self-supervised DML with distance-based mining that improves the performance of a model with limited labels. We use a dataset that contains over 5.4 million ECGs without concomitant central pressure labels to pre-train a self-supervised DML model which showed improved classification of elevated mPCWP compared to self-supervised contrastive baselines. Additionally, the supervised DML model that uses ECGs with access to 8,172 mPCWP labels demonstrated significantly better performance on the mPCWP regression task compared to the supervised baseline. Moreover, our data suggest that DML yields models that are performant across patient subgroups, even when some patient subgroups are under-represented in the dataset. Our code is available at https://github.com/mandiehyewon/ssldm

arXiv.org e-Print Archive

A Structure-free Method for Quantifying Conformational Flexibility in proteins

Author: Arenas Daniel J.
Burger Virginia
Stultz Collin M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2016

All proteins sample a range of conformations at physiologic temperatures and this inherent flexibility enables them to carry out their prescribed functions. A comprehensive understanding of protein function therefore entails a characterization of protein flexibility. Here we describe a novel approach for quantifying a protein’s flexibility in solution using small-angle X-ray scattering (SAXS) data. The method calculates an effective entropy that quantifies the diversity of radii of gyration that a protein can adopt in solution and does not require the explicit generation of structural ensembles to garner insights into protein flexibility. Application of this structure-free approach to over 200 experimental datasets demonstrates that the methodology can quantify a protein’s disorder as well as the effects of ligand binding on protein flexibility. Such quantitative descriptions of protein flexibility form the basis of a rigorous taxonomy for the description and classification of protein structure.

Funding: Massachusetts Institute of Technology (Steve G. and Renee Finn Faculty Innovation Fellowship); Swiss National Science Foundation (Early Postdoc.Mobility Fellowship)
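As a rough illustration of the idea of an entropy over radii of gyration mentioned in the abstract above, the sketch below computes a Shannon entropy from a binned distribution of radius-of-gyration values. It is not the paper's procedure, which works from SAXS data directly. The function name `rg_entropy`, the list-of-values input, and the 1 Å default bin width are all assumptions made for this example.

```python
import math


def rg_entropy(rg_values, bin_width=1.0):
    """Shannon entropy (in nats) of a binned radius-of-gyration distribution.

    rg_values: hypothetical radii of gyration (e.g. in angstroms).
    bin_width: assumed histogram bin width, in the same units.
    """
    if not rg_values:
        raise ValueError("rg_values must be non-empty")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    # Count how many values fall into each bin.
    counts = {}
    for rg in rg_values:
        bin_index = math.floor(rg / bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1

    # Convert counts to probabilities and sum -p * log(p).
    total = len(rg_values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Illustrative inputs only: a compact protein whose values share one bin,
# and a flexible one whose values are spread over four bins.
rigid = [15.1, 15.2, 15.3, 15.4]
flexible = [12.5, 18.5, 24.5, 30.5]
```

Under these assumptions a narrow distribution gives an entropy near zero and a broad one gives a larger value, which is the qualitative behaviour the abstract attributes to more disordered proteins.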

DSpace@MIT

PubMed Central

Comparative Studies of Disordered Proteins with Similar Sequences: Application to Aβ40 and Aβ42

Author: Fisher Charles K.
Stultz Collin M.
Ullman Orly
Publication venue: 'Elsevier BV'
Publication date: 01/08/2012

Quantitative comparisons of intrinsically disordered proteins (IDPs) with similar sequences, such as mutant forms of the same protein, may provide insights into IDP aggregation, a process that plays a role in several neurodegenerative disorders. Here we describe an approach for modeling IDPs with similar sequences that simplifies the comparison of the ensembles by utilizing a single library of structures. The relative population weights of the structures are estimated using a Bayesian formalism, which provides measures of uncertainty in the resulting ensembles. We applied this approach to the comparison of ensembles for Aβ40 and Aβ42. Bayesian hypothesis testing finds that although both Aβ species sample β-rich conformations in solution that may represent prefibrillar intermediates, the probability that Aβ42 samples these prefibrillar states is roughly an order of magnitude larger than the frequency with which Aβ40 samples such structures. Moreover, the structure of the soluble prefibrillar state in our ensembles is similar to the experimentally determined structure of Aβ that has been implicated as an intermediate in the aggregation pathway. Overall, our approach for comparative studies of IDPs with similar sequences provides a platform for future studies on the effect of mutations on the structure and function of disordered proteins.

DSpace@MIT

Elsevier - Publisher Connector

PubMed Central

Intrinsically Disordered Proteins: Where Computation Meets Experiment

Author: Burger Virginia M.
Gurry Thomas
Stultz Collin M.
Publication venue: 'MDPI AG'
Publication date: 01/10/2014

Proteins are heteropolymers that play important roles in virtually every biological reaction. While many proteins have well-defined three-dimensional structures that are inextricably coupled to their function, intrinsically disordered proteins (IDPs) do not have a well-defined structure, and it is this lack of structure that facilitates their function. As many IDPs are involved in essential cellular processes, various diseases have been linked to their malfunction, thereby making them important drug targets. In this review we discuss methods for studying IDPs and provide examples of how computational methods can improve our understanding of IDPs. We focus on two intensely studied IDPs that have been implicated in very different pathologic pathways. The first, p53, has been linked to over 50% of human cancers, and the second, Amyloid-β (Aβ), forms neurotoxic aggregates in the brains of patients with Alzheimer’s disease. We use these representative proteins to illustrate some of the challenges associated with studying IDPs and demonstrate how computational tools can be fruitfully applied to arrive at a more comprehensive understanding of these fascinating heteropolymers.

Funding: National Science Foundation (U.S.). Directorate for Biological Sciences. Postdoctoral Research Fellowship (Grant 1309247)

Multidisciplinary Digital Publishing Institute

DSpace@MIT

Directory of Open Access Journals

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

Author: Alam Ridwan
Chandak Payal
Guttag John
Raghu Aniruddh
Stultz Collin M.
Publication date: 20/07/2023

Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.

Comment: ICML 202
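To make the two-level idea in the abstract above concrete, the sketch below applies one contrastive loss to sequence-level embeddings and one to embeddings of individual data points, then combines them. It is only an illustration under stated assumptions: the one-directional InfoNCE loss is a simplified stand-in for a SimCLR-style objective, and the names `info_nce` and `two_level_loss`, the temperature, and the weighted sum are hypothetical choices, not the authors' implementation.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def info_nce(view_a, view_b, temperature=0.1):
    """One-directional InfoNCE loss between two lists of embedding vectors.

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    loss = 0.0
    for i, anchor in enumerate(a):
        logits = [_dot(anchor, cand) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
        loss += log_norm - logits[i]
    return loss / len(a)


def two_level_loss(seq_a, seq_b, comp_a, comp_b, weight=0.5):
    """Weighted sum of a sequence-level and a component-level loss.

    seq_*: embeddings of whole sequences under two augmentations.
    comp_*: embeddings of individual high-dimensional data points.
    weight: assumed mixing coefficient between the two levels.
    """
    return weight * info_nce(seq_a, seq_b) + (1 - weight) * info_nce(comp_a, comp_b)


# Toy embeddings: matched views should score a lower loss than shuffled ones.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
```

Because the abstract describes the strategy as agnostic to the loss used at each level, `info_nce` could in principle be swapped for a non-contrastive objective without changing `two_level_loss`.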

arXiv.org e-Print Archive

ECG Morphological Variability in Beat Space for Risk Stratification After Acute Coronary Syndrome

Author: Guttag John V.
Liu Yun
Morrow David A.
Scirica Benjamin M.
Stultz Collin M.
Syed Zeeshan
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/06/2014

Background: Identification of patients who are at high risk of adverse cardiovascular events after an acute coronary syndrome (ACS) remains a major challenge in clinical cardiology. We hypothesized that quantifying variability in electrocardiogram (ECG) morphology may improve risk stratification post‐ACS. Methods and Results: We developed a new metric to quantify beat‐to‐beat morphologic changes in the ECG: morphologic variability in beat space (MVB), and compared our metric to published ECG metrics (heart rate variability [HRV], deceleration capacity [DC], T‐wave alternans, heart rate turbulence, and severe autonomic failure). We tested the ability of these metrics to identify patients at high risk of cardiovascular death (CVD) using 1082 patients (1‐year CVD rate, 4.5%) from the MERLIN‐TIMI 36 (Metabolic Efficiency with Ranolazine for Less Ischemia in Non‐ST‐Elevation Acute Coronary Syndrome—Thrombolysis in Myocardial Infarction 36) clinical trial. DC, HRV/low frequency–high frequency, and MVB were all associated with CVD (hazard ratios [HRs] from 2.1 to 2.3 [P<0.05 for all] after adjusting for the TIMI risk score [TRS], left ventricular ejection fraction [LVEF], and B‐type natriuretic peptide [BNP]). In a cohort with low‐to‐moderate TRS (N=864; 1‐year CVD rate, 2.7%), only MVB was significantly associated with CVD (HR, 3.0; P=0.01, after adjusting for LVEF and BNP). Conclusions: ECG morphological variability in beat space contains prognostic information complementary to the clinical variables, LVEF and BNP, in patients with low‐to‐moderate TRS. ECG metrics could help to risk stratify patients who might not otherwise be considered at high risk of CVD post‐ACS

Harvard University - DASH

PubMed Central

Deep Blue Documents at the University of Michigan

Motif Discovery in Physiological Datasets: A Methodology for Inferring Predictive Elements

Author: Bailey T.
Collin Stultz
Harms S.
Jin X.
John Guttag
Lin J.
Manolis Kellis
Patel P.
Piotr Indyk
Zeeshan Syed
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2008

In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity that is unlikely to have occurred purely by chance in symbolic data is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. These extensions allow for a framework for information inference, where precursors are identified as approximately conserved activity of arbitrary complexity preceding multiple occurrences of an event. We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for motifs in the sudden death population against control populations of normal individuals and those with non-fatal supraventricular arrhythmias. 
Our results suggest that predictive motif discovery may be able to identify clinically relevant information even in the absence of significant prior knowledge.

Funding: CIMIT (Center for Integration of Medicine and Innovative Technology); Harvard University--MIT Division of Health Sciences and Technology
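The two-stage process described in the abstract above (symbolize the signal, then search the symbolic sequences for patterns associated with an outcome) can be sketched as follows. This is a deliberately simplified illustration: it uses fixed thresholds for symbolization and exact substring matching, whereas the paper describes approximate matching with techniques such as a two-layer Gibbs sampling algorithm. All function names, thresholds, and toy sequences here are assumptions.

```python
from collections import Counter


def symbolize(signal, thresholds, alphabet="abc"):
    """Map each sample to a symbol based on how many thresholds it reaches."""
    if len(alphabet) != len(thresholds) + 1:
        raise ValueError("alphabet needs one more symbol than thresholds")
    return "".join(alphabet[sum(x >= t for t in thresholds)] for x in signal)


def kmer_support(sequences, k):
    """Count in how many sequences each length-k substring occurs at least once."""
    support = Counter()
    for seq in sequences:
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def enriched_patterns(cases, controls, k, min_gap=0.5):
    """Return patterns whose case frequency exceeds control frequency by min_gap."""
    case_support = kmer_support(cases, k)
    control_support = kmer_support(controls, k)
    enriched = {}
    for pattern, n_cases in case_support.items():
        gap = n_cases / len(cases) - control_support[pattern] / len(controls)
        if gap >= min_gap:
            enriched[pattern] = gap
    return enriched


# Toy symbolic sequences standing in for outcome and control populations.
cases = ["aabca", "cabcb"]
controls = ["aaaab", "bbbba"]
```

Here `enriched_patterns(cases, controls, 3, min_gap=0.75)` keeps only the pattern `abc`, which occurs in both case sequences and in neither control. A frequency gap is a crude substitute for the likelihood scores the abstract mentions, and is used only to keep the example short.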

DSpace@MIT

Crossref

Hidden States within Disordered Regions of the CcdA Antitoxin Protein

Author: Albert Konijnenberg
Alexandra Vandervelde
Berg J. M.
Collin M. Stultz
Frank Sobott
Gasteiger E.
Jaffe A.
Jelle Hendrix
Neuhaus D.
Remy Loris
Virginia M. Burger
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

The bacterial toxin–antitoxin system CcdB–CcdA provides a mechanism for the control of cell death and quiescence. The antitoxin protein CcdA is a homodimer composed of two monomers that each contain a folded N-terminal region and an intrinsically disordered C-terminal arm. Binding of the intrinsically disordered C-terminal arm of CcdA to the toxin CcdB prevents CcdB from inhibiting DNA gyrase and thereby averts cell death. Accurate models of the unfolded state of the partially disordered CcdA antitoxin can therefore provide insight into general mechanisms whereby protein disorder regulates events that are crucial to cell survival. Previous structural studies were able to model only two of three distinct structural states, a closed state and an open state, that are adopted by the C-terminal arm of CcdA. Using a combination of free energy simulations, single-pair Förster resonance energy transfer experiments, and existing NMR data, we developed structural models for all three states of the protein. Contrary to prior studies, we find that CcdA samples a previously unknown state where only one of the disordered C-terminal arms makes extensive contacts with the folded N-terminal domain. Moreover, our data suggest that previously unobserved conformational states play a role in regulating antitoxin concentrations and the activity of CcdA’s cognate toxin. These data demonstrate that intrinsic disorder in CcdA provides a mechanism for regulating cell fate.

Lirias

Crossref

Institutional Repository Universiteit Antwerpen

White Rose Research Online

FigShare